MLB expansion seems more likely with each passing year. Not only has Rob Manfred stated that expansion is a goal of his before he retires, but the current MLB collective bargaining agreement also explicitly allows for an expansion to 32 teams. Given this information, we decided to build a model that creates a 32-team version of MLB that minimizes the haversine distance between teams that share a division, all while preserving the historic rivalries and league affiliations that make MLB special. This project draws on knowledge of integer linear programming and skill with the AMPL modeling language.
Originally, the goal for the project was to build a model that would optimize a 32-team, four-division, two-league version of MLB while finding the ideal locations for two new expansion franchises. However, we soon realized that having just two divisions per league resulted in teams from opposite sides of North America sharing a division. We then decided to make an additional model that calculated an optimal league structure for a 32-team, eight-division, two-league MLB.
To populate our model with data, we used Google Maps location data to find the latitude and longitude of each MLB stadium. For the possible expansion teams, we settled on Nashville, Tennessee; Portland, Oregon; Salt Lake City, Utah; and Montreal, Quebec, Canada, and took the latitude and longitude of those cities to use in the model.
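As a point of reference for how latitude and longitude pairs turn into a distance, here is a minimal sketch of the standard haversine formula in Python. It is not the code used in the project: the function name `haversine_km`, the Earth-radius constant, and the approximate city-center coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant, not from the write-up


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # hav(theta) = sin^2(dphi / 2) + cos(phi1) * cos(phi2) * sin^2(dlam / 2)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


# Approximate city-center coordinates (illustrative, not the stadium data used in the project)
nashville = (36.16, -86.78)
portland = (45.52, -122.68)
print(round(haversine_km(*nashville, *portland)), "km")
```

Note that the trigonometric functions expect radians, so coordinates in decimal degrees have to be converted first.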
When figuring out how to calculate the haversine distance between teams, there were two main problems: how to actually calculate the distance between each pair of teams, and how to make sure that the distance would only be counted if the teams in question shared a division. To address both, our minimization model had an objective function that looked like this:
“minimize havdistance: sum{x in 1..34, n in 1..4} ((1-(cos(TeamLatitude[x]) - cos(TeamLatitude[35-x])) + cos(TeamLatitude[x]) * cos(TeamLatitude[35-x]) * (1-(cos(TeamLongitude[x]) - cos(TeamLongitude[35-x])))) / 2)*(-1+(A[x,n]+A[35-x,n]))”
Because we had 34 hypothetical teams to consider including in the model, we paired index x with index 35-x whenever calculating the difference between the latitudes and longitudes of teams, with the intent that every possible pairing of teams would have its distance considered. We then multiplied that distance by -1 + (A[x,n] + A[35-x,n]). This is because A[x,n] would always equal one while the model is running its initial calculations to measure the distances between teams. Therefore, if we didn’t subtract one, the distance between every single pairing of teams would be counted in the model. However, if the teams represented by A[x,n] and A[35-x,n] shared division n, adding those two values together would give two. Subtracting one then leaves a multiplier of exactly one, ensuring that only the distances between pairs of teams that share a division are counted in the summation of the objective function.
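To make the role of the multiplier concrete, the short Python sketch below tabulates -1 + (A[x,n] + A[35-x,n]) for every combination of binary values. The function name `pair_weight` and its arguments are hypothetical stand-ins for the AMPL variables, not part of the original model.

```python
def pair_weight(a_x, a_y):
    """Coefficient applied to a pair's distance for one division n.

    a_x and a_y stand in for the binary values A[x,n] and A[35-x,n].
    """
    return -1 + (a_x + a_y)


# Enumerate every combination of the two binary assignment values.
for a_x, a_y in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(a_x, a_y, pair_weight(a_x, a_y))
```

The coefficient is 1 only when both teams are in division n and 0 when exactly one of them is. When neither team is in division n it evaluates to -1, which is worth keeping in mind when reading the objective.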
To ensure that every team currently in MLB was included, we added the following constraint for each value of x from 1 through 30:
“subject to Team_Assignment {x in 1..30}: sum{n in 1..4} A[x,n] = 1;”
This ensured not only that every team currently in MLB was included in the model, but also that each was assigned to exactly one division rather than spread across several. That approach wouldn’t work for our possible expansion cities, however, because we only wanted to include two of them. So we added the following constraints:
“subject to Team_Assignment_31 : sum{n in 1..4} A[31,n] <= 1;
subject to Team_Assignment_32 : sum{n in 1..4} A[32,n] <= 1;
subject to Team_Assignment_33 : sum{n in 1..4} A[33,n] <= 1;
subject to Team_Assignment_34 : sum{n in 1..4} A[34,n] <= 1;
subject to newteamassignment : sum{n in 1..4} A[31,n] + sum{n in 1..4} A[32,n] + sum{n in 1..4} A[33,n] + sum{n in 1..4} A[34,n] >= 2;”
This allowed any individual expansion city to be left out of the model, while still ensuring that at least two of the candidate cities were included.
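The assignment rules above can be sketched as a small Python check on a candidate solution. This is only an illustration under stated assumptions: the dictionary representation of A, the function name `check_assignment`, and the sample assignment are hypothetical, and only the constraints quoted above are tested.

```python
NUM_DIVISIONS = 4
EXISTING = range(1, 31)     # current MLB teams, indexed 1..30 as in the model
CANDIDATES = range(31, 35)  # expansion cities, indexed 31..34


def check_assignment(A):
    """A maps (team, division) -> 0 or 1; missing keys are treated as 0."""
    def divisions_used(team):
        return sum(A.get((team, n), 0) for n in range(1, NUM_DIVISIONS + 1))

    existing_ok = all(divisions_used(x) == 1 for x in EXISTING)     # Team_Assignment
    candidates_ok = all(divisions_used(x) <= 1 for x in CANDIDATES)  # Team_Assignment_31..34
    enough_new = sum(divisions_used(x) for x in CANDIDATES) >= 2     # newteamassignment
    return existing_ok and candidates_ok and enough_new


# Hypothetical assignment: existing teams spread round-robin, teams 31 and 33 admitted.
A = {(x, (x - 1) % NUM_DIVISIONS + 1): 1 for x in EXISTING}
A[(31, 1)] = 1
A[(33, 2)] = 1
print(check_assignment(A))
```

Dropping one of the two admitted expansion teams, or placing an existing team in a second division, makes the check fail.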
After reflection, I believe there was one fatal flaw in our models. In a competitive league like MLB, it is crucial that no team has to deal with significantly more travel than any other. Unfortunately, because our model sought to minimize the total haversine distance between teams, the gap between the smallest and the largest average team travel distance was quite high. This was a result of the model grouping a handful of teams that were extremely close together with one team that was much farther away in the same division, leaving that one team with a much higher average travel distance than the rest. In the future, I would have my objective function minimize average team travel distance instead of minimizing the sum of team travel distances.
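The imbalance described here can be illustrated with a toy Python example. The distances below are made up for a hypothetical four-team division and are not results from the model; the helper names `d` and `average_travel` are likewise assumptions.

```python
# Hypothetical pairwise distances (km) within one four-team division:
# teams "A", "B", and "C" are clustered, team "D" is far away.
dist = {
    ("A", "B"): 100, ("A", "C"): 120, ("B", "C"): 90,
    ("A", "D"): 2000, ("B", "D"): 2050, ("C", "D"): 1980,
}
teams = ["A", "B", "C", "D"]


def d(u, v):
    """Look up a distance regardless of the order of the pair."""
    return dist.get((u, v), dist.get((v, u)))


def average_travel(team):
    """Mean distance from one team to each of its division rivals."""
    others = [t for t in teams if t != team]
    return sum(d(team, t) for t in others) / len(others)


averages = {t: average_travel(t) for t in teams}
total = sum(dist.values())
spread = max(averages.values()) - min(averages.values())
print(averages)
print("total:", total, "spread:", spread)
```

In this toy division the three clustered teams each average under 750 km to their rivals while the outlier averages 2010 km, so a single total hides a large gap between teams.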